32 research outputs found

    Evaluating the limits of network topology inference via virtualized network emulation

    The Internet measurement community is beset by a lack of ground truth, or knowledge of the real, underlying network in topology inference experiments. While better tools and methodologies can be developed, quantifying the effectiveness of these mapping utilities and explaining pathologies is difficult, if not impossible, without knowing the network topology being probed. In this thesis we present a tool that eliminates topological uncertainty in an emulated, virtualized environment. First, we automatically build topological ground truth according to various network generation models and create emulated Cisco router networks by leveraging and modifying existing emulation software. We then automate topological inference from one vantage point at a time for every vantage point in the network. Finally, we incorporate a mechanism to study common sources of network topology inference abnormalities by including the ability to induce link failures within the network. In addition, this thesis reexamines previous work in sampling Autonomous System-level Internet graphs to procure realistic models for emulation and simulation. We build upon this work by including additional data sets and more recent Internet topologies to sample from, and we observe results that diverge from those of the original authors. Lastly, we introduce a new technique for sampling Internet graphs that better retains particular graph metrics across multiple timeframes and data sets.
    http://archive.org/details/evaluatinglimits1094545932
    Outstanding Thesis. Captain, United States Marine Corps. Approved for public release; distribution is unlimited.

    The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome.

    X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may more profoundly be shaped by limits imposed on gene expression owing to haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) prior to the X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using The Encyclopedia of DNA Elements (ENCODE) and data from the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than seen for X-to-autosome retrogenes on autosomes. Also as expected, X-linked genes have a lesser degree of increase in gene expression than autosomal ones (compared to the human/chimpanzee common ancestor) if highly expressed, but not if lowly expressed. The traffic jam model also explains the known lower breadth of expression for genes on the X (and the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As then further predicted, highly expressed tissue-specific genes are also rare on the X and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level, not the breadth of expression per se.
Importantly, a limit to the maximal expression level explains biased tissue of expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which gene expression is relatively rare on the X chromosome. These trends cannot be fully accounted for in terms of alternative models of biased expression. In conclusion, the notion that it is hard for genes on the Therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale for many peculiar features of the X's gene content, gene expression, and evolution.

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism

    EUI-64 Considered Harmful

    This position paper considers the privacy and security implications of EUI-64-based IPv6 addresses. By encoding MAC addresses, EUI-64 addresses violate layers by exposing hardware identifiers in IPv6 addresses. The hypothetical threat of EUI-64 addresses is well-known, and the adoption of privacy extensions in operating systems (OSes) suggests this vulnerability has been mitigated. Instead, our work seeks to quantify the empirical existence of EUI-64 IPv6 addresses in today’s Internet. By analyzing i) traceroutes, ii) DNS records, and iii) mobile phone behaviors, we find surprisingly significant use of EUI-64. We characterize the origins and behaviors of these EUI-64 IPv6 addresses, and advocate for changes in provider IPv6 addressing policies.
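    To illustrate how an EUI-64 address exposes a hardware identifier, the following is a minimal sketch, not the authors' tooling. It assumes the standard modified EUI-64 layout: the 48-bit MAC is split in half, the bytes ff:fe are inserted in the middle, and the universal/local bit of the first byte is flipped. The function name mac_from_eui64 is hypothetical. Note that the ff:fe test is only a heuristic, since a randomly generated interface identifier can contain those bytes by chance.

```python
import ipaddress
from typing import Optional


def mac_from_eui64(addr: str) -> Optional[str]:
    """Return the MAC embedded in a modified EUI-64 IPv6 address, or None.

    Hypothetical helper for illustration: it only checks for the ff:fe
    marker in the middle of the 64-bit interface identifier.
    """
    # The interface identifier is the low 64 bits of the address.
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    b = iid.to_bytes(8, "big")

    # Modified EUI-64 inserts ff:fe between the two halves of the MAC.
    if b[3:5] != b"\xff\xfe":
        return None

    # Undo the universal/local bit flip (0x02) on the first byte,
    # then drop the ff:fe filler to recover the six MAC bytes.
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{x:02x}" for x in mac)


if __name__ == "__main__":
    print(mac_from_eui64("2001:db8::211:22ff:fe33:4455"))
    print(mac_from_eui64("2001:db8::1"))
```

    The addresses in the usage lines come from the 2001:db8::/32 documentation prefix and are examples only.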

    Revisiting AS-Level Graph Reduction

    The topological structure of the Internet – the interconnection of routers and autonomous systems (ASes) – is large and complex. Frequently it is necessary to evaluate network protocols and applications on “Internet-like” graphs in order to understand their security, resilience, and performance properties. A fundamental obstacle to emulation and simulation is creating realistic Internet-like topologies of reduced order. We reexamine existing AS graph reduction algorithms and find that they struggle to capture graph theoretic properties of modern topologies and topologies obtained from different sources. We develop a new AS graph reduction method that performs well across time periods and data sets.
    NSF CNS-1213155. DHS Cyber Security N66001-2250-5823.

    The Neighbor Matrix: Generalizing A Graph’s Degree Sequence

    The newly introduced neighborhood matrix extends the power of adjacency and distance matrices to describe the topology of graphs. The adjacency matrix enumerates which pairs of vertices share an edge and it may be summarized by the degree sequence, a list of the adjacency matrix row sums. The distance matrix shows more information, namely the length of shortest paths between vertex pairs. We introduce and explore the neighborhood matrix, which we have found to be to the distance matrix what the degree sequence is to the adjacency matrix. The neighbor matrix includes the degree sequence as its first column and the sequence of all other distances in the graph up to the graph’s diameter, enumerating the number of neighbors each vertex has at every distance present in the graph. We prove this matrix to contain eleven oft-used graph statistics and topological descriptors. We also provide insight into two applications that show potential utility of the neighbor matrix in comparing graphs and identifying topologically significant vertices in a graph.
    This work was partially funded by a grant from the Department of Defense.
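    The construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is undirected and unweighted, it is given as an adjacency dictionary, and rows are ordered by sorted vertex label (the paper may order rows differently). The function name neighbor_matrix is hypothetical. Column k of each row counts the vertices at shortest-path distance k from that row's vertex, so the first column is the degree sequence.

```python
from collections import deque


def neighbor_matrix(adj):
    """Map each vertex to a list whose (k-1)-th entry is the number of
    vertices at shortest-path distance k, for k = 1..diameter.

    `adj` is an adjacency dict {vertex: iterable of neighbors} for an
    undirected, unweighted graph (an assumption of this sketch).
    """
    vertices = sorted(adj)
    counts_by_vertex = {}
    diameter = 0

    for source in vertices:
        # Breadth-first search gives shortest-path distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)

        # Tally how many vertices sit at each positive distance.
        counts = {}
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
        counts_by_vertex[source] = counts
        diameter = max(diameter, max(counts, default=0))

    # Pad every row with zeros out to the graph's diameter.
    return {
        v: [counts_by_vertex[v].get(k, 0) for k in range(1, diameter + 1)]
        for v in vertices
    }


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for vertex, row in neighbor_matrix(path).items():
        print(vertex, row)
```

    For the path graph in the usage lines, the end vertices have one neighbor at each of distances 1, 2, and 3, while the interior vertices have two neighbors at distance 1, one at distance 2, and none at distance 3.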

    Discovering the IPv6 Network Periphery

    The article of record as published may be found at https://doi.org/10.1007/978-3-030-44081-7_1
    We consider the problem of discovering the IPv6 network periphery, i.e., the last hop router connecting endhosts in the IPv6 Internet. Finding the IPv6 periphery using active probing is challenging due to the IPv6 address space size, wide variety of provider addressing and subnetting schemes, and incomplete topology traces. As such, existing topology mapping systems can miss the large footprint of the IPv6 periphery, disadvantaging applications ranging from IPv6 census studies to geolocation and network resilience. We introduce “edgy,” an approach to explicitly discover the IPv6 network periphery, and use it to find > 64M IPv6 periphery router addresses and > 87M links to these last hops – several orders of magnitude more than in currently available IPv6 topologies. Further, only 0.2% of edgy’s discovered addresses are known to existing IPv6 hitlists.
    This work supported in part by NSF grant CNS-1855614.